Workshop on Computational Personality Recognition: Shared Task
Abstract
In the Workshop on Computational Personality Recognition (Shared Task), we released two datasets, varying in size and genre, annotated with gold standard personality labels. This allowed participants to evaluate features and learning techniques, and to compare the performance of their personality recognition systems on a common benchmark. The task had 8 participants. In this paper we discuss the results and compare them to previous literature.

Introduction and Background

Personality Recognition (see Mairesse et al. 2007) consists of the automatic classification of authors’ personality traits, which can be compared against gold standard labels obtained by means of personality tests. The Big5 test (Costa & McCrae 1985, Goldberg et al. 2006) is the most popular personality test, and has become a standard over the years. It describes personality along five traits formalized as bipolar scales, namely: 1) Extraversion (x) (sociable vs shy); 2) Neuroticism (n) (neurotic vs calm); 3) Agreeableness (a) (friendly vs uncooperative); 4) Conscientiousness (c) (organized vs careless); 5) Openness (o) (insightful vs unimaginative).

In recent years the interest of the scientific community in personality recognition has grown very fast. The pioneering works by Argamon et al. 2005 and Oberlander & Nowson 2006, and the seminal paper by Mairesse et al. 2007, applied personality recognition to long texts, such as short essays or blog posts. The current challenges are instead related to the extraction of personality from mobile social networks (Staiano et al. 2012), from social network sites (see Quercia et al. 2011, Golbeck et al. 2011, Bachrach et al. 2012, Kosinski et al. 2013) and from languages other than English (Kermanidis 2012, Bai et al. 2012). Many other applications can also take advantage of personality recognition, including social network analysis (Celli & Rossi 2012), recommendation systems (Roshchina et al. 2011), deception detection (Enos et al. 2006), authorship attribution (Luyckx & Daelemans 2008), sentiment analysis/opinion mining (Golbeck & Hansen 2011), and others.

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In the Workshop on Computational Personality Recognition (Shared Task), we invited contributions from researchers or teams working in these areas or other related fields. Despite a growing number of works in personality recognition, it is still difficult to gauge their performance and quality, because almost all the scholars working in the field run their experiments on very different datasets and use very different evaluation procedures (Celli 2013). These problems are exacerbated by the fact that producing gold standard data for personality recognition is difficult and costly. In 2012 there was a competition on personality prediction from Twitter streaming data with about 90 teams participating, which shows the great interest of industry and the research community in this field.

The Workshop on Computational Personality Recognition (Shared Task) is different from a simple competition: we do not want to focus just on systems’ performance, but rather to provide a benchmark for discovering which feature sets, resources, and learning techniques are useful in the extraction of personality from text and from social network data. We released two datasets, different in size and domain, annotated with gold standard personality labels. This allowed participants to compare the performance of their personality recognition systems on a common benchmark, or to exploit personality labels for other related tasks, such as social network analysis. In this paper we summarize the results of the Workshop on Computational Personality Recognition (Shared Task), discussing challenges and possible future directions.
The paper is structured as follows: in the next section we provide an overview of previous work on personality recognition. Then in the following sections we present the datasets and the shared task, we report and discuss the results, and finally we draw some conclusions.
Similar resources
Reports on the 2013 Workshop Program of the Seventh International AAAI Conference on Weblogs and Social Media
The Workshop on Computational Personality Recognition allowed participants to compare the results of their systems on a common benchmark. Unlike competitive shared tasks, the workshop did not focus just on performance, but rather on discovering which feature sets, resources, and learning techniques are useful in the extraction of personality from text. Organizers provided two gold-standard labe...
Bidirectional LSTM for Named Entity Recognition in Twitter Messages
In this paper, we present our approach for named entity recognition in Twitter messages that we used in our participation in the Named Entity Recognition in Twitter shared task at the COLING 2016 Workshop on Noisy User-generated text (WNUT). The main challenge that we aim to tackle in our participation is the short, noisy and colloquial nature of tweets, which makes named entity recognition in ...
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition
This paper presents the results of the two shared tasks associated with W-NUT 2015: (1) a text normalization task with 10 participants; and (2) a named entity tagging task with 8 participants. We outline the task, annotation process and dataset statistics, and provide a high-level overview of the participating systems for each shared task.
DeepNNNER: Applying BLSTM-CNNs and Extended Lexicons to Named Entity Recognition in Tweets
In this paper, we describe the DeepNNNER entry to The 2nd Workshop on Noisy User-generated Text (WNUT) Shared Task #2: Named Entity Recognition in Twitter. Our shared task submission adopts the bidirectional LSTM-CNN model of Chiu and Nichols (2016), as it has been shown to perform well on both newswire and Web texts. It uses word embeddings trained on large-scale Web text collections together ...
Named Entity Recognition for South and South East Asian Languages: Taking Stock
In this paper we first present a brief discussion of the problem of Named Entity Recognition (NER) in the context of the IJCNLP workshop on NER for South and South East Asian (SSEA) languages. We also present a short report on the development of a named entity annotated corpus in five South Asian languages, namely Hindi, Bengali, Telugu, Oriya and Urdu. We present some details about a new nam...